Enhancing the Arabic Treebank: a Collaborative Effort toward New Annotation Guidelines
Abstract
The Arabic Treebank team at the Linguistic Data Consortium has significantly revised and enhanced its annotation guidelines and procedure over the past year. Improvements were made to both the morphological and syntactic annotation guidelines, and annotators were trained in the new guidelines, with a focus on areas of low inter-annotator agreement. The revised guidelines are now being applied in annotation production. Together with a period of intensive annotator training, they have already raised inter-annotator agreement F-measure scores and improved parsing results.
Related resources
Enhanced Annotation and Parsing of the Arabic Treebank
The Arabic Treebank at the Linguistic Data Consortium has significantly revised and enhanced its annotation guidelines and annotation procedure over the past year. The revised syntactic guidelines are now being applied in annotation production, and the combination of the revised guidelines and a period of intensive annotator training has raised inter-annotator agreement f-measure scores already...
Syntactic Annotation Guidelines for the Quranic Arabic Dependency Treebank
The Quranic Arabic Dependency Treebank (QADT) is part of the Quranic Arabic Corpus (http://corpus.quran.com), an online linguistic resource organized by the University of Leeds, and developed through online collaborative annotation. The website has become a popular study resource for Arabic and the Quran, and is now used by over 1,500 researchers and students daily. This paper presents the tree...
Creating a Methodology for Large-Scale Correction of Treebank Annotation: The Case of the Arabic Treebank
The LDC Arabic Treebank team has significantly revised and enhanced its annotation guidelines and annotation procedures over the last two years, with the goal of reducing inconsistency in annotation in the Treebank. We have now completed automatic and significant manual revisions to 738,845 tokens/words in total, bringing them into line as far as possible with the new annotation guidelines and ...
The Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus
Drawing on our three-year experience of developing a large-scale corpus of annotated Arabic text, our paper will address the following: (a) review pertinent Arabic language issues as they relate to methodology choices, (b) explain our choice to use the Penn English Treebank style of guidelines (requiring the Arabic-speaking annotators to deal with a new grammatical system) rather than doing the anno...
Developing An Arabic Treebank: Methods, Guidelines, Procedures, And Tools
In this paper we address the following questions from our experience of the last two and a half years in developing a large-scale corpus of Arabic text annotated for morphological information, part-of-speech, English gloss, and syntactic structure: (a) How did we ‘leapfrog’ through the stumbling blocks of both methodology and training in setting up the Penn Arabic Treebank (ATB) annotation? (b)...